1 Introduction


There has been substantial research on partial identification since the
seminal work of Manski. For example, see the monographs by Manski (2003, 2007), a
recent review by Tamer (2010), and references therein for extensive details. In its
general form, identification results are typically expressed as nonparametric bounds
via moment inequalities or other similar population quantities. When these unknown
population quantities are high-dimensional (e.g., the dimension of covariates is high in
conditional moment inequalities), there is a curse of dimensionality problem in that
a very large sample is required to achieve good precision in estimation and inference
(see, e.g., Chernozhukov et al. (2013)). In this paper, we propose a method for
inference that avoids the curse of dimensionality by exploiting the model structure.
We illustrate our idea in the context of commonly used discrete choice models.

To explain this issue, suppose that one is interested in identifying a structural
parameter in a binary choice model. In this model, it is quite common to assume
that an individual's utility function is parametric while making weak assumptions
regarding underlying unobserved heterogeneity. Specifically, consider the following
model

$$Y = 1\{X'\beta \ge \varepsilon\}, \tag{1.1}$$

where $Y$ is the binary outcome, $X$ is an observed $d$ dimensional random vector,
$\varepsilon$ is an unobserved random variable, $\beta \in \Gamma$ is a vector of unknown
true parameters, and $\Gamma \subset \mathbb{R}^d$ is the parameter space for $\beta$.

Without sufficient exogenous variation from covariates, $\beta$ is only partially
identified. The resulting identification region is characterized by expressions involving
nonparametric choice probabilities conditional on covariates. For example, under the
assumption that the conditional median of $\varepsilon$ is independent of $X$ and other
regularity conditions that will be given in Section 2, $\beta$ is partially identified by

$$\Theta = \{b \in \Gamma : X'b\,[P(Y = 1|X) - 0.5] \ge 0 \text{ almost surely}\}. \tag{1.2}$$

Recently, Komarova (2013) and Blevins (2015) use this type of characterization to
partially identify $\beta$. Both papers consider estimation and inference of the identified
set $\Theta$ using a maximum score objective function; however, they do not develop
inference methods for the parameter value $\beta$ based on the conditional moment
inequalities in (1.2). Unlike theirs, we focus on inference for $\beta$ as well as the issue
of dimension reduction in the context of conditional moment inequalities.

When $X$ contains several continuous covariates yet their support is not rich enough
to ensure point identification, we can, for instance, construct a confidence region for
$\beta$ by inverting the test of Chernozhukov et al. (2013, henceforth CLR), who plug
in nonparametric (kernel or series based) estimators to form a one-sided
Kolmogorov-Smirnov type statistic for testing the conditional moment inequalities. In
order to conduct inference based on the CLR method, we need to estimate the conditional
expectation $E(Y|X) = P(Y = 1|X)$ nonparametrically. In this context, it is difficult to
carry out inference in a fully nonparametric fashion when $d$ is large. One may attempt to
use parametric models to fit the choice probabilities. However, that can lead to
misspecification, which may invalidate the whole partial identification approach. Hence,
it is important to develop dimension reduction methods that avoid misspecification
but improve the precision of inference, compared to fully nonparametric methods.

In this paper, we establish an alternative characterization of $\Theta$ that is free from
the curse of dimensionality. One of the main results of this paper (Lemma 1 in Section
2) is that $\Theta = \tilde{\Theta}$, where

$$\tilde{\Theta} = \{b \in \Gamma : X'b\,[P(Y = 1|X'b, X'\gamma) - 0.5] \ge 0 \text{ almost surely for all } \gamma \in \Gamma\}. \tag{1.3}$$

This characterization of the identified set $\Theta$ enables us to break the curse of
dimensionality since we now need to deal with the choice probability conditional on only
two indexing variables. The benefit of using the characterization in $\tilde{\Theta}$, as
opposed to $\Theta$, is most clear when we estimate the conditional expectation functions
directly. The local power of a Kolmogorov-Smirnov type test decreases as the dimension of
the conditioning variables gets large (for example, see CLR and Armstrong (2014a, 2014b,
2015)). If the method of CLR is utilized with (1.2), the dimension of nonparametric
smoothing is $d$. If, instead, the same method is combined with (1.3), the dimension of
nonparametric smoothing is always 2, even if $d$ is large.
Therefore, the latter method is free from the curse of dimensionality.
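
To make the dimension count concrete, the following minimal Python sketch maps a covariate draw of any dimension to the pair of indexing variables that appears as the conditioning set in (1.3). The function names `index` and `two_indices` and all numerical values are hypothetical and serve only as an illustration.

```python
def index(x, b):
    """Linear index x'b for one covariate draw x and a parameter vector b."""
    return sum(xk * bk for xk, bk in zip(x, b))


def two_indices(x, b, gamma):
    """Conditioning variables (x'b, x'gamma) used in (1.3)."""
    return (index(x, b), index(x, gamma))


# Hypothetical draws with d = 3 and d = 10. The conditioning vector has
# length 2 in both cases, whereas conditioning on X itself has length d.
x3, b3, g3 = [0.5, -0.2, 0.1], [1.0, 0.3, 0.0], [1.0, -0.4, 0.7]
x10, b10, g10 = [0.1] * 10, [1.0] + [0.0] * 9, [1.0] + [0.5] * 9

pair3 = two_indices(x3, b3, g3)
pair10 = two_indices(x10, b10, g10)
```

Whatever the value of d, a nonparametric smoother applied to these pairs operates in two dimensions.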

The remainder of the paper is organized as follows. In Section 2, we provide a
formal statement about the binary choice model (1.1). In Section 3, we show that
our approach can be extended to the class of semiparametric models under which
the sign of the conditional expectation of a certain transformation of the outcome is the
same as that of the indexing variable. This extension covers a variety of discrete
choice models in the literature. Section 4 describes how to construct a confidence
set based on CLR, and Section 5 presents some results of Monte Carlo simulation
experiments that illustrate the finite-sample advantage of using the dimension reducing
approach. Section 6 concludes, and the Appendix contains the proofs and some further
results.


2 Conditional moment inequalities for a binary choice 
model 

To convey the main idea of this paper in a simple form, we start with a binary choice
model. Recall that in the binary choice model (1.1), we have that
$Y = 1\{X'\beta \ge \varepsilon\}$, where the distribution of $\varepsilon$ conditional on
$X$ is unknown. Let $\Gamma_X$ denote the support of $X$. Write $X = (X_1, \tilde{X})$ where
$\tilde{X}$ is the subvector of $X$ excluding its first element.

Let $\Gamma$ be the parameter space that contains the true parameter vector value $\beta$.
Let $b$ denote a generic element of $\Gamma$. Let $Q_\tau(U|V)$ denote the $\tau$ quantile
of the distribution of a random variable $U$ conditional on a random vector $V$. We study
inference of the model under the following assumptions.

Condition 1. (i) $|b_1| = 1$ for all $b \in \Gamma$. (ii) The distribution of $X_1$
conditional on $\tilde{X} = \tilde{x}$ is absolutely continuous with respect to the Lebesgue
measure for almost every realization $\tilde{x}$.

Condition 2. (i) For some $\tau \in (0,1)$ and for all $x \in \Gamma_X$,
$Q_\tau(\varepsilon|X = x) = 0$. (ii) For all $x \in \Gamma_X$, there is an open interval
containing zero such that the distribution of $\varepsilon$ conditional on $X = x$ has a
density that is everywhere positive on this interval.

The event $X'\beta \ge \varepsilon$ determining the choice is invariant with respect to an
arbitrary positive scalar multiplying both sides of the inequality. Therefore, parameter
scale is not identified; following the literature (e.g., Horowitz (1992)), we assume
Condition 1(i) for scale normalization. Condition 1(i) and (ii) together imply that the
model admits at least one continuous covariate. Condition 2(i), due to Manski (1985, 1988),
is a quantile independence assumption and allows for nonparametric specification
of the preference shock with a general form of heteroskedasticity. Condition 2(ii)
implies that, for all $x \in \Gamma_X$, $P(\varepsilon \le t|X = x)$ is strictly increasing
in $t$ in a neighborhood of the point $t = 0$. This is a fairly weak restriction which
includes, yet is not confined to, the case where the distribution of $\varepsilon$
conditional on $X$ has a density that is everywhere positive on $\mathbb{R}$.

Under Condition 2, Manski (1988, Proposition 2) established that the necessary
and sufficient condition for point identification of $\beta$ is that, for $b \ne \beta$,

$$P(X'b < 0 \le X'\beta \text{ or } X'b \ge 0 > X'\beta) > 0. \tag{2.1}$$


Given the scale normalizing assumption, the condition (2.1) effectively requires that
the covariates $X$ should be observed with sufficient variation. Hence, lack of adequate
support of the distribution of $X$ may result in non-identification of $\beta$. For example,
Manski (1988) and Horowitz (1998, Section 3.2.2) constructed non-identification cases
for which all covariates take discrete values. Admitting continuous covariates does
not guarantee identification either. As indicated by Manski (1985, Lemma 1),
non-identification also arises when the covariates are distributed over a bounded support
such that one of the choices is observed with probability well below $\tau$ for almost every
realized value of $X$. In empirical applications of the discrete choice model, it is quite
common to include continuous variables in the covariate specification. Therefore, the
present paper develops a method for inference of $\beta$ in the presence
of continuous covariates for the model where the support of data may not be rich
enough to fulfill the point-identifying condition (2.1).

Though Conditions 1 and 2 do not suffice for point identification of $\beta$, they still
induce restrictions on possible values of data generating preference parameters, which
results in set identification of $\beta$. To see this, note that Condition 2 implies that for
all $x \in \Gamma_X$,

$$P(Y = 1|X = x) > \tau \iff x'\beta > 0, \tag{2.2}$$
$$P(Y = 1|X = x) = \tau \iff x'\beta = 0, \tag{2.3}$$
$$P(Y = 1|X = x) < \tau \iff x'\beta < 0. \tag{2.4}$$


Given Condition 1, $X'b$ is continuously distributed for any $b \in \Gamma$. Thus
$P(Y = 1|X) = \tau$ occurs with zero probability. The set of observationally equivalent
preference parameter values that conform with Condition 2 can hence be characterized by

$$\Theta = \{b \in \Gamma : X'b\,[P(Y = 1|X) - \tau] \ge 0 \text{ almost surely}\}. \tag{2.5}$$

Given (2.2), (2.3) and (2.4), we also have that

$$\Theta = \{b \in \Gamma : b'XX'\beta \ge 0 \text{ almost surely}\}. \tag{2.6}$$

Namely, the vector $b$ is observationally equivalent to $\beta$ if and only if the indexing
variables $X'b$ and $X'\beta$ are of the same sign almost surely.
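
As a purely illustrative aid, the sign condition in (2.6) can be checked on a finite set of covariate values. The sketch below is a sample analog only: a finite check cannot establish an almost-sure statement, and the support points, parameter values, and function names are hypothetical.

```python
def index(x, b):
    """Linear index x'b."""
    return sum(xk * bk for xk, bk in zip(x, b))


def satisfies_sign_condition(xs, b, beta):
    """Check b'x * x'beta >= 0 on every point in xs (sample analog of (2.6))."""
    return all(index(x, b) * index(x, beta) >= 0.0 for x in xs)


# Hypothetical support points in which x1 and x2 always share the same sign.
xs = [(0.2, 0.9), (0.7, 0.1), (-0.3, -0.8), (-0.6, -0.2)]
beta = (1.0, 0.0)

keeps_sign = satisfies_sign_condition(xs, (1.0, 0.5), beta)
flips_sign = satisfies_sign_condition(xs, (1.0, -1.0), beta)
```

Here the candidate (1, 0.5) never changes the sign of the index, while (1, -1) does so at the point (0.2, 0.9), so only the former passes the check.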

Operationally, one could make inference on $\beta$ by pointwise inverting a test of the
conditional moment inequalities given in (2.5). However, as discussed in Section 1,
there is a curse of dimensionality in nonparametric inference of the conditional
expectation when the dimension of continuous covariates is high. By exploiting the
restrictions implied by Conditions 1 and 2, we now present a novel set of
conditional moment inequalities that can equivalently characterize the set $\Theta$ yet
enable inference to be performed free from the curse of dimensionality.

Note that the restrictions (2.2), (2.3) and (2.4) imply that

$$Q_{1-\tau}(Y|X) = 1\{X'\beta > 0\} = Q_{1-\tau}(Y|X'\beta) \text{ almost surely.}$$

In other words, we have that with probability 1,

$$\mathrm{sgn}[P(Y = 1|X) - \tau] = \mathrm{sgn}[P(Y = 1|X'\beta) - \tau] = \mathrm{sgn}[X'\beta], \tag{2.7}$$

where $\mathrm{sgn}(\cdot)$ is the sign function such that $\mathrm{sgn}(u) = 1$ if $u > 0$;
$\mathrm{sgn}(u) = 0$ if $u = 0$; $\mathrm{sgn}(u) = -1$ if $u < 0$. The sign equivalence (2.7)
motivates use of indexing variables instead of the full set of covariates as the conditioning
variables in nonparametric estimation of the conditional expectation, thereby breaking the
curse of dimensionality as raised in the discussion above. To be precise, let

$$\tilde{\Theta} = \{b : X'b\,[P(Y = 1|X'b, X'\gamma) - \tau] \ge 0 \text{ almost surely for all } \gamma \in \Gamma\}.$$

The first key result of our approach is the following lemma showing that the identified
set $\Theta$ can be equivalently characterized by $\tilde{\Theta}$, which is based on the
choice probabilities conditional on two indexing variables.

Lemma 1. Under Conditions 1 and 2, we have that $\Theta = \tilde{\Theta}$.


To explain the characterization result of Lemma 1, note that the model (1.1) under
Condition 2 implies that for any $\gamma \in \Gamma$,

$$\mathrm{sgn}[P(Y = 1|X'\beta, X'\gamma) - \tau] = \mathrm{sgn}[X'\beta] \text{ almost surely.} \tag{2.8}$$

Thus, intuitively speaking, for any $b$ that is observationally equivalent to $\beta$, equation
(2.8) should also hold for $b$ in place of $\beta$ in the statement. Define

$$\underline{\Theta} = \{b : X'b\,[P(Y = 1|X'\gamma) - \tau] \ge 0 \text{ almost surely for all } \gamma \in \Gamma\}, \tag{2.9}$$

$$\overline{\Theta} = \{b : X'b\,[P(Y = 1|X'b) - \tau] \ge 0 \text{ almost surely}\}. \tag{2.10}$$

By arguments similar to those used in the proof of Lemma 1, it is straightforward to show
that

$$\underline{\Theta} \subset \Theta = \tilde{\Theta} \subset \overline{\Theta}. \tag{2.11}$$

It is interesting to note that the set inclusions in (2.11) can be strict, as demonstrated
in the examples of Appendix A.2. Namely, the set $\underline{\Theta}$ is too restrictive, and
a test of the inequalities given by (2.9) may erroneously reject the true parameter value
$\beta$ with probability approaching unity. Moreover, the set $\overline{\Theta}$ is not
sharp, and thus a test of the inequalities given by (2.10) would not be consistent against
some $b$ values that are incompatible with the inequality restrictions given by (2.5).

The identifying relationship in (2.11) can be viewed as a conditional moment
inequality analog of well-known index restrictions in semiparametric binary response
models (e.g., Cosslett (1983), Powell et al. (1989), Han (1987), Ichimura (1993), Klein
and Spady (1993), Coppejans (2001)). The main difference between our setup and
those models is that we allow for partial identification as well as a general form of
heteroskedasticity. We also note that, to ensure an equivalent characterization of the
set $\Theta$, we need two indices, unlike the single index in the point-identified cases.
3 General results for a class of semiparametric models
under sign restrictions

In this section, we extend the dimension reducing characterization approach of the
previous section to a variety of semiparametric models under which the sign of the
conditional expectation of a certain transformation of the outcome is the same as that
of the indexing variable. We treat univariate and multivariate outcome models in a
unified abstract setting given as follows.

Let $(Y, X)$ be the data vector of an individual observation where $Y$ is a vector
of outcomes and $X$ is a vector of covariates. The econometric model specifying the
distribution of $Y$ conditional on $X$ depends on a finite dimensional parameter vector
$\beta$ and is characterized by the following sign restrictions.

Assumption 1. For some set $C$ and some known functions $G$ and $H$, and for all
$c \in C$, the following statements hold with probability 1:

$$G(X, c, \beta) > 0 \iff E(H(Y, c)|X) > 0, \tag{3.1}$$
$$G(X, c, \beta) = 0 \iff E(H(Y, c)|X) = 0, \tag{3.2}$$
$$G(X, c, \beta) < 0 \iff E(H(Y, c)|X) < 0. \tag{3.3}$$


Let $\beta$ be the true data generating parameter vector. Assume $\beta \in \Gamma$ where
$\Gamma$ denotes the parameter space. Let $b$ be a generic element of $\Gamma$. Note that the
functions $G$ and $H$ in Assumption 1 are determined by the specification of the given model.

For example, for the binary choice model of Section 2, Assumption 1 is fulfilled by
taking $G(X, c, b) = X'b$ and $H(Y, c) = Y - \tau$, both being independent of $c$. Other
examples satisfying Assumption 1 are presented below.

Define

$$\Theta_0 = \{b \in \Gamma : \text{(3.1), (3.2) and (3.3) hold with } b \text{ in place of } \beta \text{ almost surely for all } c \in C\}.$$

Note that $\Theta_0$ consists of observationally equivalent parameter values that conform
with the sign restrictions of Assumption 1. We impose the following continuity
assumption.

Assumption 2. For all $c \in C$ and for all $b \in \Gamma$, the event that $G(X, c, b) = 0$
occurs with zero probability.






Under Assumptions 1 and 2, we can reformulate the identified set $\Theta_0$ using weak
conditional moment inequalities given by the set

$$\Theta = \{b \in \Gamma : G(X, c, b)E(H(Y, c)|X) \ge 0 \text{ almost surely for all } c \in C\}. \tag{3.4}$$

We now derive the equivalent characterization of the set $\Theta$ using indexing variables.

Define

$$\tilde{\Theta} = \{b : G(X, c, b)E(H(Y, c)|G(X, c, b), G(X, c, \gamma)) \ge 0 \text{ almost surely for all } (\gamma, c) \in \Gamma \times C\}.$$

The following theorem generalizes the result of Lemma 1.

Theorem 1. Given Assumptions 1 and 2, we have that $\Theta_0 = \Theta = \tilde{\Theta}$.

We now discuss other examples of semiparametric models that also fit within the
setting of sign restrictions of Assumption 1.

Example 1: Ordered choice model under quantile independence
restriction

Consider an ordered response model with $K + 1$ choices. Let $\{1, \ldots, K + 1\}$ denote
the choice index set. The agent chooses alternative $c$ if and only if

$$\lambda_{c-1} < X'\theta + \varepsilon \le \lambda_c, \tag{3.5}$$

where $\lambda_0 = -\infty < \lambda_1 < \cdots < \lambda_K < \lambda_{K+1} = \infty$. Let
$\lambda = (\lambda_1, \ldots, \lambda_K)$ be the vector of threshold parameters. Let $Y$ be
the observed choice. We are interested in inference of $\beta = (\theta, \lambda)$. Lee (1992)
and Komarova (2013) studied inference of the ordered response model under a quantile
independence restriction. Assume the distribution of $\varepsilon$ conditional on $X$
satisfies Condition 2. Using this restriction, we see that Assumption 1 holds with
$C = \{1, \ldots, K\}$, $H(Y, c) = 1\{Y \le c\} - \tau$ and $G(X, c, b) = \tilde{X}_c'b$, where
$\tilde{X}_c = (-X', l_c')'$ with $l_c$ being the $K$ dimensional vector
$(l_{c,1}, \ldots, l_{c,K})$ such that $l_{c,j} = 1$ if $j = c$ and $l_{c,j} = 0$ otherwise.
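
A minimal sketch of these two functions may help fix the notation. It assumes the reading $G(X, c, b) = \tilde{X}_c'b = \lambda_c - X'\theta$ with $b = (\theta, \lambda)$ and $H(Y, c) = 1\{Y \le c\} - \tau$; the function names and numerical values are hypothetical.

```python
def ordered_G(x, c, theta, lam):
    """G(X, c, b) = X~_c' b with b = (theta, lam) and c in {1, ..., K}."""
    K = len(lam)
    l_c = [1.0 if j == c else 0.0 for j in range(1, K + 1)]  # selector vector l_c
    x_tilde = [-xk for xk in x] + l_c                        # X~_c = (-X', l_c')'
    b = list(theta) + list(lam)
    return sum(u * v for u, v in zip(x_tilde, b))


def ordered_H(y, c, tau):
    """H(Y, c) = 1{Y <= c} - tau."""
    return (1.0 if y <= c else 0.0) - tau


# Hypothetical values with K = 3 thresholds and two covariates.
x, theta, lam = [0.4, -1.0], [1.0, 0.5], [-0.5, 0.3, 1.2]
g2 = ordered_G(x, 2, theta, lam)   # equals lam_2 - x'theta
h_low = ordered_H(2, 2, 0.5)       # Y = 2 <= c = 2
h_high = ordered_H(3, 2, 0.5)      # Y = 3 >  c = 2
```

The stacked vector reduces the threshold comparison to a single linear index in the full parameter vector, which is what allows the ordered model to fit the template of Assumption 1.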
Example 2: Multinomial choice model

Consider a multinomial choice model with $K$ alternatives. Let $\{1, \ldots, K\}$ denote the
choice index set. The utility from choosing alternative $j$ is

$$U_j = X_j'\beta + \varepsilon_j, \tag{3.6}$$

where $X_j$ is a vector of observed choicewise covariates and $\varepsilon_j$ is a choicewise
preference shock. The agent chooses alternative $k$ if $U_k > U_j$ for all $j \ne k$. Let $X$
denote the vector $(X_1, \ldots, X_K)$ and $Y$ denote the observed choice. We assume that
the unobservables $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_K)$ satisfy the following
rank ordering property.

Condition 3. For any pair $(s, t)$ of choices, we have that with probability 1,

$$X_s'\beta > X_t'\beta \iff P(Y = s|X) > P(Y = t|X). \tag{3.7}$$

Manski (1975), Matzkin (1993) and Fox (2007) used Condition 3 as an identifying
restriction in the multinomial choice model to allow for nonparametric unobservables
with unknown form of heteroskedasticity. Goeree et al. (2005, Proposition 5, p. 359)
showed that it suffices for Condition 3 to assume that the joint distribution of
$\varepsilon$ conditional on $X$ for almost every realization of $X$ is exchangeable and has
a joint density that is everywhere positive on $\mathbb{R}^K$.

Under Condition 3, Assumption 1 holds for this example by taking
$C = \{(s, t) \in \{1, \ldots, K\}^2 : s < t\}$, $G(X, s, t, b) = (X_s - X_t)'b$ and
$H(Y, s, t) = 1\{Y = s\} - 1\{Y = t\}$.
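
The pairwise structure of this example can be sketched as follows. The code assumes that $X$ is stored as a list of $K$ covariate vectors and that alternatives are labelled $1, \ldots, K$; the function names and numerical values are hypothetical.

```python
def multinomial_G(X, s, t, b):
    """G(X, s, t, b) = (X_s - X_t)' b for 1-based alternatives s and t."""
    return sum((xs - xt) * bk for xs, xt, bk in zip(X[s - 1], X[t - 1], b))


def multinomial_H(y, s, t):
    """H(Y, s, t) = 1{Y = s} - 1{Y = t}."""
    return float(y == s) - float(y == t)


K = 3
# Index set C: all ordered pairs (s, t) with s < t.
C = [(s, t) for s in range(1, K + 1) for t in range(s + 1, K + 1)]

# Hypothetical choicewise covariates and candidate parameter value.
X = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
b = [1.0, -0.5]
g12 = multinomial_G(X, 1, 2, b)
```

With $K$ alternatives the index set contains $K(K-1)/2$ pairs, and each pair contributes one sign restriction of the form (3.1) to (3.3).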

Example 3: Binary choice panel data with fixed effect 

Consider the following binary choice panel data model

$$Y_t = 1\{X_t'\beta + v \ge \varepsilon_t\}, \quad t \in \{1, \ldots, T\}, \tag{3.8}$$

where $X_t$ is a vector of per-period covariates and $v$ is an unobserved fixed effect.
Let $X$ be the vector $(X_1, \ldots, X_T)$. Let $Y = (Y_1, \ldots, Y_T)$ denote the vector of
outcomes. Manski (1987) imposed the following restrictions on the transitory shocks
$\varepsilon_t$.

Condition 4. The distribution of $\varepsilon_t$ conditional on $(X, v)$ is time invariant
and has a density that is everywhere positive on $\mathbb{R}$ for almost every realization of
$(X, v)$.




Under Condition 4 and by Lemma 1 of Manski (1987), Assumption 1 holds for
this example by taking $C = \{(s, t) \in \{1, \ldots, T\}^2 : s < t\}$,
$G(X, s, t, b) = (X_s - X_t)'b$ and $H(Y, s, t) = Y_s - Y_t$.

Example 4: Ordered choice panel data with fixed effect 

This example is concerned with the ordered choice model of Example 1 in the panel
data context. Let $\{1, \ldots, K + 1\}$ denote the choice index set. For each period
$t \in \{1, \ldots, T\}$, we observe the agent's ordered response outcome $Y_t$ that is
generated by

$$Y_t = \sum_{j=1}^{K+1} j\,1\{\lambda_{j-1} < X_t'\beta + v + \varepsilon_t \le \lambda_j\}, \tag{3.9}$$

where $v$ is an unobserved fixed effect and
$\lambda_0 = -\infty < \lambda_1 < \cdots < \lambda_K < \lambda_{K+1} = \infty$.
Let $X$ and $Y$ denote the covariate vector $(X_1, \ldots, X_T)$ and outcome vector
$(Y_1, \ldots, Y_T)$, respectively. Suppose the shocks $\varepsilon_t$ also satisfy the
stationarity assumption of Manski (1987) given by Condition 4. Under this restriction and by
applying the law of iterated expectations, we see that Assumption 1 holds for this example by
taking $C = \{(k, s, t) : k \in \{1, \ldots, K\}, (s, t) \in \{1, \ldots, T\}^2 \text{ such that } s < t\}$,
$G(X, k, s, t, b) = (X_t - X_s)'b$ and $H(Y, k, s, t) = 1\{Y_s \le k\} - 1\{Y_t \le k\}$.
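
The following sketch spells out the index set and the two functions for this panel example. It assumes the reading $G(X, k, s, t, b) = (X_t - X_s)'b$ and $H(Y, k, s, t) = 1\{Y_s \le k\} - 1\{Y_t \le k\}$, with periods and categories labelled from 1; all names and values are hypothetical.

```python
def panel_ordered_G(X, k, s, t, b):
    """G(X, k, s, t, b) = (X_t - X_s)' b; it does not depend on k."""
    return sum((xt - xs) * bk for xs, xt, bk in zip(X[s - 1], X[t - 1], b))


def panel_ordered_H(Y, k, s, t):
    """H(Y, k, s, t) = 1{Y_s <= k} - 1{Y_t <= k}."""
    return float(Y[s - 1] <= k) - float(Y[t - 1] <= k)


def index_set(K, T):
    """All triples (k, s, t) with k in 1..K and periods s < t in 1..T."""
    return [(k, s, t)
            for k in range(1, K + 1)
            for s in range(1, T + 1)
            for t in range(s + 1, T + 1)]


# Hypothetical two-period data with two covariates and K + 1 = 3 categories.
X = [[0.0, 1.0], [1.0, 1.0]]
Y = [1, 3]
b = [2.0, 1.0]
g = panel_ordered_G(X, 1, 1, 2, b)
h = panel_ordered_H(Y, 2, 1, 2)
```

The fixed effect and the thresholds drop out of $G$ because only the difference of the two period indices matters for the sign of the expected value of $H$.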

4 The $(1 - \alpha)$ level confidence set

This section describes how to construct a confidence set à la CLR. Let $v = (x, \gamma, c)$
and $\mathcal{V} = \{(x, \gamma, c) : x \in \Gamma_X, \gamma \in \Gamma, c \in C\}$. Assume
the set $\mathcal{V}$ is nonempty and compact. Define

$$m_b(v) \equiv E(G(X, c, b)H(Y, c)\,|\,G(X, c, b) = G(x, c, b), G(X, c, \gamma) = G(x, c, \gamma)) \times f_{b,c,\gamma}(G(x, c, b), G(x, c, \gamma)),$$

where the function $f_{b,c,\gamma}$ denotes the joint density function of the indexing
variables $(G(X, c, b), G(X, c, \gamma))$. Note that for all $v \in \mathcal{V}$,

$$m_b(v) \ge 0 \iff E(G(X, c, b)H(Y, c)\,|\,G(X, c, b) = G(x, c, b), G(X, c, \gamma) = G(x, c, \gamma)) \ge 0.$$
Thus we have that

$$\tilde{\Theta} = \{b : m_b(v) \ge 0 \text{ for all } v \in \mathcal{V}\}. \tag{4.1}$$

Assume that we observe a random sample of individual outcomes and covariates
$(Y_i, X_i)_{i=1,\ldots,n}$. For inference on the true parameter value $\beta$, we aim to
construct a set estimator $\hat{\Theta}$ at the $(1 - \alpha)$ confidence level such that

$$\liminf_{n \to \infty} P(\beta \in \hat{\Theta}) \ge 1 - \alpha. \tag{4.2}$$


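We now delineate an implementation of the set estimator $\hat{\Theta}$ based on a kernel
version of CLR. Let $K(\cdot, \cdot)$ denote a bivariate kernel function and $h_n$ be a
bandwidth sequence. To estimate the function $m_b$, we consider the following kernel type
estimator:

$$\hat{m}_b(v) = (nh_n^2)^{-1} \sum_{i=1}^{n} G(X_i, c, b)H(Y_i, c)K_n(X_i, v, b), \tag{4.3}$$

where

$$K_n(X_i, v, b) = K\left(\frac{G(x, c, b) - G(X_i, c, b)}{h_n}, \frac{G(x, c, \gamma) - G(X_i, c, \gamma)}{h_n}\right). \tag{4.4}$$

Define

$$T(b) = \inf_{v \in \mathcal{V}} \frac{\hat{m}_b(v)}{\hat{\sigma}_b(v)}, \tag{4.5}$$

where

$$\hat{\sigma}_b^2(v) = n^{-2}h_n^{-4} \sum_{i=1}^{n} \hat{u}_i^2(b, c, \gamma)\,G^2(X_i, c, b)\,K_n^2(X_i, v, b),$$

$$\hat{u}_i(b, c, \gamma) = H(Y_i, c) - \left[\sum_{j=1}^{n} K_n(X_j, (X_i, \gamma, c), b)\right]^{-1} \sum_{j=1}^{n} H(Y_j, c)K_n(X_j, (X_i, \gamma, c), b).$$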

For a given value of $b$, we compare the test statistic $T(b)$ to a critical value to conclude
whether there is significant evidence that the inequalities in (4.1) are violated for some
$v \in \mathcal{V}$. By applying the test procedure to each candidate value of $b$, the
estimator $\hat{\Theta}$ is then the set comprising those $b$ values not rejected under this
pointwise testing rule.
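
A grid-based sketch of the statistic may clarify the computation. It is written for the binary choice case ($G = X'b$, $H = Y - \tau$) and makes several simplifying assumptions that are not taken from the text: the bivariate kernel is a product of two biweight kernels, the infimum is replaced by a minimum over a small user-supplied grid of points $(x, \gamma)$, the variance estimate is scaled as $(nh_n^2)^{-2}$ times the sum of squared summands, and grid points with a zero variance estimate are skipped. All names and data are hypothetical.

```python
import math
import random


def biweight(u):
    """Biweight kernel, an assumed choice for each factor of the bivariate K."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def dot(x, b):
    return sum(xk * bk for xk, bk in zip(x, b))


def kern(xi, x, b, gamma, h):
    """K_n(X_i, v, b) for v = (x, gamma), using a product kernel."""
    return (biweight((dot(x, b) - dot(xi, b)) / h)
            * biweight((dot(x, gamma) - dot(xi, gamma)) / h))


def clr_statistic(X, Y, b, grid, h, tau=0.5):
    """Grid version of T(b) for G = X'b and H = Y - tau.

    Returns +inf when no grid point has a positive variance estimate.
    """
    n = len(X)
    G = [dot(xi, b) for xi in X]
    H = [yi - tau for yi in Y]
    best = math.inf
    for x, gamma in grid:
        # u_hat_i: H_i minus its kernel regression fit at (X_i'b, X_i'gamma).
        u_hat = []
        for i in range(n):
            w = [kern(X[j], X[i], b, gamma, h) for j in range(n)]
            fit = sum(wj * hj for wj, hj in zip(w, H)) / sum(w)
            u_hat.append(H[i] - fit)
        k = [kern(X[i], x, b, gamma, h) for i in range(n)]
        m_hat = sum(G[i] * H[i] * k[i] for i in range(n)) / (n * h ** 2)
        var = sum((u_hat[i] * G[i] * k[i]) ** 2 for i in range(n)) / (n ** 2 * h ** 4)
        if var > 0.0:
            best = min(best, m_hat / math.sqrt(var))
    return best


# Hypothetical data with d = 2, beta = (1, 0.5) and a Gaussian shock.
rng = random.Random(0)
beta = (1.0, 0.5)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(80)]
Y = [1 if dot(x, beta) >= rng.gauss(0.0, 0.5) else 0 for x in X]
grid = [(X[i], (1.0, -0.5)) for i in range(0, 80, 20)]
t_beta = clr_statistic(X, Y, beta, grid, h=0.5)
```

Large negative values of the statistic indicate a violation of at least one inequality on the grid; the comparison with a simulated critical value is a separate step.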

Based on the CLR method, we estimate the critical value using simulations. Let $B$
be the number of simulation repetitions. For each repetition $s \in \{1, \ldots, B\}$, we draw
an $n$ dimensional vector of mutually independent standard normal
random variables which are also independent of the data. Let $\eta(s)$ denote this vector.
For any compact set $V \subseteq \mathcal{V}$, define

$$T_s^*(b; V) \equiv \inf_{v \in V}\,(nh_n^2\hat{\sigma}_b(v))^{-1} \sum_{i=1}^{n} \eta_i(s)\,\hat{u}_i(b, c, \gamma)\,G(X_i, c, b)\,K_n(X_i, v, b). \tag{4.6}$$

We approximate the distribution of
$\inf_{v \in V}[\hat{\sigma}_b(v)]^{-1}[\hat{m}_b(v) - m_b(v)]$ over $V \subseteq \mathcal{V}$
by that of the simulated quantity $T_s^*(b; V)$. Let $q_p(b, V)$ be the $p$ level empirical
quantile based on the vector $(T_s^*(b, V))_{s \in \{1, \ldots, B\}}$. One could use
$q_\alpha(b, \mathcal{V})$ as the test critical value.

However, following CLR, we can make sharper inference by incorporating the data
driven inequality selection mechanism in the critical value estimation. Let

$$\hat{V}_n(b) \equiv \{v \in \mathcal{V} : \hat{m}_b(v) \le -2q_{\gamma_n}(b, \mathcal{V})\,\hat{\sigma}_b(v)\}, \tag{4.7}$$

where $\gamma_n \equiv 0.1/\log n$. Compared to $q_\alpha(b, \mathcal{V})$, use of
$q_\alpha(b, \hat{V}_n(b))$ as the critical value results in a test procedure concentrating
the inference on those points of $v$ that are more informative for detecting violation of the
non-negativity hypothesis on the function $m_b(v)$. In fact, the CLR test based on the set
$\hat{V}_n(b)$ is closely related to the power improvement methods such as the contact set
idea (e.g., Linton, Song and Whang (2010) and Lee, Song and Whang (2014)), the generalized
moment selection approach (e.g., Andrews and Soares (2010), Andrews and Shi (2013), and
Chetverikov (2011)), and the iterative step-down approach (e.g., Chetverikov (2013))
employed in the literature on testing moment inequalities.

Assume that $0 < \alpha < 1/2$. Then we construct the $(1 - \alpha)$ confidence set
$\hat{\Theta}$ by setting

$$\hat{\Theta} \equiv \{b \in \Gamma : T(b) \ge q_\alpha(b, \hat{V}_n(b))\}. \tag{4.8}$$

We can establish regularity conditions under which (4.2) holds by utilizing the general
results of CLR. Since the main focus of this paper is identification, we omit the
technical details for brevity.
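
The simulation of the critical value, the selection step (4.7) and the decision rule (4.8) can be sketched on a finite grid as follows. The sketch assumes that the user has already computed, for each grid point $v$ and observation $i$, the weight `weights[v][i]` $= \hat{u}_i G(X_i, c, b) K_n(X_i, v, b) / (nh_n^2\hat{\sigma}_b(v))$, so that each simulated statistic is a minimum of weighted sums of standard normal draws. The function names, the handling of an empty selected set, and the numerical inputs are assumptions made for illustration.

```python
import math
import random


def simulated_quantile(weights, p, B, rng):
    """p-level empirical quantile of T*_s = min_v sum_i eta_i(s) * weights[v][i]."""
    n = len(weights[0])
    draws = []
    for _ in range(B):
        eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(min(sum(e * w for e, w in zip(eta, row)) for row in weights))
    draws.sort()
    return draws[max(0, math.ceil(p * B) - 1)]


def select_points(m_hat, sigma_hat, q_gamma):
    """Indices of grid points kept by the selection rule (4.7)."""
    return [v for v in range(len(m_hat))
            if m_hat[v] <= -2.0 * q_gamma * sigma_hat[v]]


def in_confidence_set(m_hat, sigma_hat, weights, alpha=0.05, B=500, seed=0):
    """Decision rule (4.8) on a finite grid: True if b is not rejected.

    Requires n = len(weights[0]) >= 2 so that log(n) is positive.
    """
    n = len(weights[0])
    rng = random.Random(seed)
    q_gamma = simulated_quantile(weights, 0.1 / math.log(n), B, rng)
    keep = select_points(m_hat, sigma_hat, q_gamma)
    if not keep:
        # Assumed convention: with no near-binding inequality, do not reject.
        return True
    q_alpha = simulated_quantile([weights[v] for v in keep], alpha, B, rng)
    t_stat = min(m / s for m, s in zip(m_hat, sigma_hat))
    return t_stat >= q_alpha


# Hypothetical inputs with two grid points and n = 4 observations.
weights = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
kept = in_confidence_set([10.0, 12.0], [1.0, 1.0], weights)
rejected = in_confidence_set([-10.0, 12.0], [1.0, 1.0], weights)
```

Restricting the second simulation to the selected points can only raise the simulated quantile, which is the source of the power gain relative to simulating over the whole grid.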
5 Simulation results


The main purpose of this simulation study is to compare the finite-sample performance
of the approach of conditioning on indexing variables with that of conditioning on the full
set of covariates. We use the binary response model set forth in Section 2 for the simulation
design. The data are generated according to the following setup:

$$Y = 1\{X'\beta \ge \varepsilon\}, \tag{5.1}$$


where $X = (X_1, \ldots, X_d)$ is a $d$ dimensional covariate vector with $d \ge 2$, and


$$\varepsilon = \left[1 + \sum_{k=1}^{d} X_k^2\right]^{1/2} \xi,$$


where $\xi$ is standard normally distributed and independent of $X$. Let
$\tilde{X} = (X_2, \ldots, X_d)$ be a $(d - 1)$ dimensional vector of mutually independent
random variables, each uniformly distributed on the interval $[-1, 1]$. The covariate $X_1$
is specified by

$$X_1 = \mathrm{sgn}(X_2)\,U,$$

where $U$ is a uniformly distributed random variable on the interval $[0, 1]$ and is
independent of $(\tilde{X}, \xi)$. We set $\beta_1 = 1$ and $\beta_k = 0$ for
$k \in \{2, \ldots, d\}$. The preference parameter space is specified to be

$$\Gamma = \{b \in \mathbb{R}^d : b_1 = 1, (b_2, \ldots, b_d) \in [-1, 1]^{d-1}\}. \tag{5.2}$$

Under this setup, the sign of the true index $X'\beta = X_1$ is determined by $X_2$. By
inspecting the formulation (2.6), the identified set $\Theta$ is thus given by

$$\Theta = \{b \in \Gamma : b_2 \ge 0 \text{ and } b_k = 0 \text{ for } k \in \{3, \ldots, d\}\}. \tag{5.3}$$
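
A short simulation sketch of this design is given below. The scale function of the shock, $[1 + \sum_k X_k^2]^{1/2}$, follows our reading of the design and should be treated as an assumption; the function names, seed and sample size are hypothetical.

```python
import math
import random


def sgn(u):
    """Sign function returning -1, 0 or 1."""
    return (u > 0) - (u < 0)


def draw_observation(d, rng):
    """One draw (Y, X) with beta = (1, 0, ..., 0), so that X'beta = X_1."""
    x_rest = [rng.uniform(-1.0, 1.0) for _ in range(d - 1)]  # X_2, ..., X_d
    x1 = sgn(x_rest[0]) * rng.uniform(0.0, 1.0)              # X_1 = sgn(X_2) * U
    x = [x1] + x_rest
    xi = rng.gauss(0.0, 1.0)
    eps = math.sqrt(1.0 + sum(xk * xk for xk in x)) * xi     # assumed scale function
    y = 1 if x1 >= eps else 0
    return y, x


rng = random.Random(1)
sample = [draw_observation(5, rng) for _ in range(200)]

# Sample analog of (2.6) for two candidates: b_2 = 0.5 and b_2 = -1.
same_sign_half = all((x[0] + 0.5 * x[1]) * x[0] >= 0.0 for _, x in sample)
violations_minus_one = sum((x[0] - x[1]) * x[0] < 0.0 for _, x in sample)
```

Because $X_1$ and $X_2$ always share the same sign, any candidate with $b_2 \ge 0$ and zero remaining coefficients preserves the sign of the index, whereas $b_2 = -1$ reverses it whenever $|X_2| > |X_1|$.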


Recall that this simulation design also satisfies the general framework of Section
3 by taking $G(X, c, b) = X'b$ and $H(Y, c) = Y - 0.5$. Let Index and Full be shorthand
expressions for the index formulated and full covariate approaches, respectively.
We implement the Index approach using the inference procedure of Section 4. We
compute the term $K_n(X_i, v, b)$ using

$$K_n(X_i, v, b) = K\left(\frac{x'b - X_i'b}{\hat{s}(X'b)\,h_n}\right) K\left(\frac{x'\gamma - X_i'\gamma}{\hat{s}(X'\gamma)\,h_n}\right),$$

where $v = (x, \gamma)$, $K(\cdot)$ is the univariate biweight kernel function defined by

$$K(u) = \frac{15}{16}(1 - u^2)^2\,1\{|u| \le 1\},$$

and $\hat{s}(W)$ denotes the estimated standard deviation for the random variable $W$. The
bandwidth sequence $h_n$ is specified by


$$h_n = c_{Index}\,n^{-1/5}, \tag{5.4}$$


where $c_{Index}$ is a bandwidth scale. The rate considered in (5.4) corresponds to the
undersmoothing specification under the assumption that the true conditional
expectation function is twice continuously differentiable.
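
The kernel weight of the Index approach can be sketched as below. The biweight formula is the standard one; the default bandwidth exponent of 1/5 and the use of the sample standard deviation for $\hat{s}(\cdot)$ are assumptions, and the function names and data are hypothetical.

```python
import statistics


def biweight(u):
    """Univariate biweight kernel K(u) = (15/16)(1 - u^2)^2 on |u| <= 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def bandwidth(n, scale, rate=0.2):
    """h_n = scale * n**(-rate); the default rate 1/5 is an assumption."""
    return scale * n ** (-rate)


def index_kernel(xi, x, b, gamma, sd_b, sd_gamma, h):
    """Product of two biweight kernels of standardized index differences."""
    diff_b = sum((xk - xik) * bk for xk, xik, bk in zip(x, xi, b))
    diff_g = sum((xk - xik) * gk for xk, xik, gk in zip(x, xi, gamma))
    return biweight(diff_b / (sd_b * h)) * biweight(diff_g / (sd_gamma * h))


# Hypothetical covariate draws, candidate b and direction gamma.
X = [(0.2, 0.9), (0.7, 0.1), (-0.3, -0.8), (-0.6, -0.2)]
b, gamma = (1.0, 0.5), (1.0, -0.5)
sd_b = statistics.stdev(sum(xk * bk for xk, bk in zip(x, b)) for x in X)
sd_gamma = statistics.stdev(sum(xk * gk for xk, gk in zip(x, gamma)) for x in X)
h = bandwidth(len(X), scale=2.0)
w_self = index_kernel(X[0], X[0], b, gamma, sd_b, sd_gamma, h)
```

Standardizing each index by its estimated standard deviation lets a single bandwidth $h_n$ serve both smoothing directions.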

The Full approach is based on inversion of the kernel-type CLR test for the
inequalities that $m_{b,Full}(x) \ge 0$ for all $x \in \Gamma_X$, where

$$m_{b,Full}(x) \equiv E(X'b\,(Y - 0.5)\,|\,X = x)\,f_X(x) \tag{5.5}$$

and $f_X$ denotes the joint density of $X$. As in the Index approach, we consider the
kernel type estimator

$$\hat{m}_{b,Full}(x) = (nh_{n,Full}^d)^{-1} \sum_{i=1}^{n} X_i'b\,(Y_i - 0.5)\,K_{n,Full}(X_i, x), \tag{5.6}$$

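where

$$K_{n,Full}(X_i, x) = \prod_{k=1}^{d} K_{Full}\left(\frac{x_k - X_{i,k}}{\hat{s}(X_k)\,h_{n,Full}}\right), \tag{5.7}$$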


$K_{Full}(\cdot)$ is the univariate $p$th order biweight kernel function (see Hansen (2005)),
and $h_{n,Full}$ is a bandwidth sequence specified by

$$h_{n,Full} = c_{Full}\,n^{-r}, \tag{5.8}$$

where $c_{Full}$ and $r$ denote the bandwidth scale and rate, respectively. The test statistic
for the Full approach is given by

$$T_{Full}(b) = \inf_{x \in \Gamma_X} \frac{\hat{m}_{b,Full}(x)}{\hat{\sigma}_{b,Full}(x)}, \tag{5.9}$$

where

$$\hat{\sigma}_{b,Full}^2(x) = n^{-2}h_{n,Full}^{-2d} \sum_{i=1}^{n} \hat{u}_{i,Full}^2\,(X_i'b)^2\,K_{n,Full}^2(X_i, x),$$

$$\hat{u}_{i,Full} = Y_i - \left[\sum_{j=1}^{n} K_{n,Full}(X_j, X_i)\right]^{-1} \sum_{j=1}^{n} Y_j\,K_{n,Full}(X_j, X_i).$$

We computed the simulated CLR test critical value that also embedded the inequality
selection mechanism. By comparing $T_{Full}(b)$ to the test critical value, we constructed
under the Full approach the confidence set that also satisfies (4.2).

The nominal significance level $\alpha$ was set to be 0.05. Let $\hat{\Theta}_{Index}$ and
$\hat{\Theta}_{Full}$ denote the $(1 - \alpha)$ level confidence sets constructed under the
Index and Full approaches, respectively. For $s \in \{Index, Full\}$ and for a fixed value of
$b$, we calculated $P_s(b)$, which is the simulated finite-sample probability of the event
$b \notin \hat{\Theta}_s$ based on 1000 simulation repetitions. For each repetition, we
generated $n \in \{250, 500, 1000\}$ observations according to the data generating design
described above. We used 4000 simulation draws to calculate $q_\alpha(b, \hat{V}_n(b))$ for
the Index approach and to estimate the CLR test critical value for the Full approach. We
implemented for the Full approach the minimization operation based on grid search over 1000
grid points of $x$ randomly drawn from the joint distribution of $X$. For the Index approach,
the minimization was implemented by grid search over 1000 grid points of $(x, \gamma)$ for
which $x$ was also randomly drawn from the distribution of $X$, and $\gamma$ was drawn from
the uniform distribution on the space $\Gamma$ and independently of the search direction in
$x$.

We conducted simulations for $d \in \{3, 4, 5, 10\}$. For the Full approach, both the
bandwidth rate $r$ and the order $p$ of $K_{Full}$ depend on the covariate dimension. These
were specified to fulfill the regularity conditions for the CLR kernel type conditional
moment inequality tests (see the discussion in Appendix F of CLR (pp. 7-9, Supplementary
Material)). Note that for $b \in \Theta$, $P_{Index}(b)$ ($P_{Full}(b)$) is the simulated
null rejection probability of the corresponding CLR test under the Index (Full) approach,
whereas for $b \notin \Theta$, it is the CLR test power. For simplicity, we computed
$P_{Index}(b)$ and $P_{Full}(b)$ for $b$ values specified as $b = (b_1, b_2, \ldots, b_d)$
where $b_1 = 1$, $b_2 \in \{0, 0.5, -1\}$, and $b_k = 0$ for $k \in \{3, \ldots, d\}$. For
these candidate values of $b$, we experimented over various bandwidth scales to determine
the value of $c_{Index}$ ($c_{Full}$) with which the Index (Full) approach exhibits the best
overall performance in terms of its corresponding CLR test size and power. Table 1 presents
the settings of $r$ and $p$ and the chosen bandwidth scales $c_{Index}$ and $c_{Full}$ in the
simulation.


Table 1: Settings of r, p, c_Index and c_Full

d                     3        4        5        10
r                   11/70     1/9     21/220    1/21
p                     2        4        4        6

sample size 250
c_Index             3.05     3.45     3.7      4.1
c_Full              2.65     4.8      5.6      8.35

sample size 500
c_Index             2.55     2.95     3.05     3.75
c_Full              2.35     4.3      4.9      8

sample size 1000
c_Index             2        2.5      2.75     3.5
c_Full              2.15     3.95     4.45     7.7


Tables 2 and 3 present the simulation results that compare performance of the 
Index and Full approaches. 


Table 2: Simulated null rejection probabilities

                        b_2 = 0                       b_2 = 0.5
d               3      4      5      10       3      4      5      10

sample size 250
P_Index       .034   .029   .034   .050     .051   .054   .052   .052
P_Full        .031   .043   .046   .050     .050   .053   .052   .055

sample size 500
P_Index       .030   .036   .039   .042     .051   .054   .052   .050
P_Full        .032   .034   .043   .044     .049   .048   .054   .053

sample size 1000
P_Index       .047   .045   .041   .048     .054   .053   .051   .054
P_Full        .029   .044   .041   .042     .046   .051   .047   .051
Table 3: Simulated test power for b_2 = -1 (ratio = P_Index / P_Full)

              n = 250                  n = 500                  n = 1000
d      P_Index  P_Full  ratio   P_Index  P_Full  ratio   P_Index  P_Full  ratio
3       .583     .601   .970     .771     .731   1.05     .927     .828   1.11
4       .541     .530   1.02     .733     .653   1.12     .868     .758   1.14
5       .500     .393   1.27     .699     .624   1.12     .806     .738   1.09
10      .409     .216   1.89     .474     .212   2.23     .520     .225   2.31


From Table 2, we can see that all $P_{Index}$ and $P_{Full}$ values in all the simulation
cases are either below or close to the nominal level 0.05 with the maximal value being
0.055 and occurring for the Full approach with sample size 250 under the setup of
$d = 10$ and $b_2 = 0.5$. For both methods, there is slight over-rejection for the case of
$b_2 = 0.5$. At the true data generating value ($b_2 = 0$), both $P_{Index}$ and $P_{Full}$
are well capped by 0.05 and the confidence sets $\hat{\Theta}_{Index}$ and
$\hat{\Theta}_{Full}$ can hence cover the true parameter value with probability at least 0.95
in all simulations.

For the power of the test, we compare the Index and Full approaches under the
same covariate configuration. Table 3 indicates that the power of the Index
approach dominates that of the Full approach in almost all simulation
configurations. Moreover, at the largest sample size ($n = 1000$), the power of
the Index approach exceeds 0.8 in almost all cases, whereas that of the Full
approach does so only for the case of $d = 3$. The power difference between
these two approaches tends to increase as either the sample size or the
covariate dimension increases. For the case of $d = 10$, there is a substantial
power gain from using the Index approach. For this covariate specification, the
curse of dimensionality for the Full approach is quite apparent because the
corresponding $\hat{P}_{\mathrm{Full}}$ values vary only slightly across sample sizes.
In short, the simulation results suggest that the Index approach may alleviate
the problem associated with the curse of dimensionality, and we could therefore
make sharper inference by using the Index approach for a model with a high
dimensional vector of covariates.
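
The power comparisons above can be reproduced directly from the entries of Table 3. The sketch below is illustrative only: the nested dictionary and the helper names are assumptions made here, and the values are transcribed from the table.

```python
# Power values transcribed from Table 3:
# POWER[n][d] = (P_Index, P_Full, reported ratio).
POWER = {
    250:  {3: (.583, .601, 0.970), 4: (.541, .530, 1.02),
           5: (.500, .393, 1.27), 10: (.409, .216, 1.89)},
    500:  {3: (.771, .731, 1.05), 4: (.733, .653, 1.12),
           5: (.699, .624, 1.12), 10: (.474, .212, 2.23)},
    1000: {3: (.927, .828, 1.11), 4: (.868, .758, 1.14),
           5: (.806, .738, 1.09), 10: (.520, .225, 2.31)},
}


def index_wins(power):
    """Configurations (n, d) in which the Index power exceeds the Full power."""
    return [(n, d) for n, row in power.items()
            for d, (p_index, p_full, _) in row.items() if p_index > p_full]


def exceeds(power, n, column, level=0.8):
    """Dimensions d at sample size n whose power (column 0 = Index, 1 = Full) exceeds level."""
    return [d for d, values in power[n].items() if values[column] > level]


def max_ratio_gap(power):
    """Largest gap between the reported ratio and P_Index / P_Full."""
    return max(abs(p_index / p_full - ratio)
               for row in power.values()
               for (p_index, p_full, ratio) in row.values())


print(len(index_wins(POWER)), "of 12 configurations favour the Index approach")
print(exceeds(POWER, 1000, 0), exceeds(POWER, 1000, 1))
```

The only configuration in which the Full approach has higher power is $d = 3$ with $n = 250$, and the reported ratios agree with the quotient of the two power columns up to the rounding of the table entries.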


6 Conclusions 

This paper studies inference of preference parameters in semiparametric discrete
choice models when these parameters are not point identified and the identified
set is characterized by a class of conditional moment inequalities. Exploring
the semiparametric modeling restrictions, we show that the identified set can be
equivalently formulated by moment inequalities conditional on only two
continuous indexing variables. Such formulation holds regardless of the
covariate dimension, thereby breaking the curse of dimensionality for
nonparametric inference of the underlying conditional moment functions. We also
extend this dimension reducing characterization result to a variety of
semiparametric models under which the sign of the conditional expectation of a
certain transformation of the outcome is the same as that of the indexing
variable.

We note that moment inequalities (3.4) for the general framework of Section 3 can
also be applied to monotone transformation models (e.g., see Abrevaya (1999, 2000),
Chen (2010) and Pakes and Porter (2014, Section 2)). Hence, our dimension reducing
approach would also be useful in that context.

There is a growing number of inference methods for conditional moment
inequalities. The instrumental variable approach of Andrews and Shi (2013) does
not rely on nonparametric estimation of conditional expectations. Nevertheless,
the number of instruments required to convert the conditional moment
inequalities to unconditional ones increases with the covariate dimension. In
addition to the Andrews-Shi and CLR approaches, other existing inference
procedures include Armstrong and Chan (2013), Armstrong (2014a, 2015),
Chetverikov (2011), Lee, Song, and Whang (2013, 2014) and Menzel (2014), among
others. The performance of all of these methods is related to the dimension of
the conditioning variables. Armstrong (2014b, see Tables 1 and 2) gives the
local power properties of popular approaches in the literature and shows that
the local power decreases as the dimension of the conditioning variables
increases in each case that he considers. Thus, the curse of dimensionality
problem is not limited to a particular test statistic. Combining these
alternative methods with the dimension reducing characterization result for set
inference in the class of semiparametric models studied in Section 3 of this
paper is an interesting topic for further research.

A Appendix 

A.1 Proofs

Proof of Lemma 1. Lemma 1 follows by applying Theorem 1 with $G(X, c, b) = X'b$
and $H(Y, c) = Y - \tau$ and noting that Assumptions 1 and 2 of Theorem 1 are both
satisfied under Conditions 1 and 2.


Proof of Theorem 1. By Assumptions 1 and 2, the event that $E(H(Y, c) \mid X) = 0$ also
occurs with zero probability. It hence follows that ©o = 0. Therefore, if 6 G 0, then 
with probability 1 , 


$$G(X, c, b) > 0 \iff E(H(Y, c) \mid X) > 0. \tag{A.1}$$


Note that 


$$E(H(Y, c) \mid G(X, c, b), G(X, c, \gamma)) = E\big(E(H(Y, c) \mid X) \mid G(X, c, b), G(X, c, \gamma)\big). \tag{A.2}$$


By (A.1), for any $\gamma \in \Gamma$, the right-hand side of (A.2) has the same sign as $G(X, c, b)$
does with probability 1. Hence, 6 G 0. 

On the other hand, assume that 6 G 0. Since /9 G T, we have that G{X, c, b) 
and $E(H(Y, c) \mid G(X, c, b), G(X, c, \beta))$ have the same sign with probability 1. Using
(A.2) and Assumption 1, we see that $E(H(Y, c) \mid G(X, c, b), G(X, c, \beta))$, $G(X, c, \beta)$ and
$E(H(Y, c) \mid X)$ also have the same sign with probability 1. Therefore, we have that
6 G 0. ■ 


A.2 Illustrating examples for non-equivalence of the sets 0, 
0 and 0 

Recall that $\Gamma$ denotes the space of preference parameter vectors $b$ whose first
element has magnitude equal to 1.

Example 1: 0 can be a proper subset of 0 

Let $X = (X_1, X_2)$ be a bivariate vector where $X_1 \sim U(0, 1)$, $X_2 \sim U(-1, 1)$ and
$X_1$ is stochastically independent of $X_2$. Assume that $\beta = (1, 1)$ and
$\varepsilon = \sqrt{1 + X_2^2}\,\xi$, where $\xi$ is a random variable independent of $X$ with
distribution function $F_\xi(t)$ defined as

$$F_\xi(t) = \begin{cases} G_1(t) & \text{if } t \in (-\infty, -1] \\ \tau + ct & \text{if } t \in (-1, 1] \\ G_2(t) & \text{if } t \in (1, \infty) \end{cases} \tag{A.3}$$

where $c \in (0, \min\{\tau, 1 - \tau\})$ is a fixed real constant, $G_1$ and $G_2$ are continuously
differentiable and strictly increasing functions defined on domains that include
the intervals $(-\infty, -1]$ and $(1, \infty)$, respectively, and satisfy

$$G_1(-1) = \tau - c, \quad \lim_{t \to -\infty} G_1(t) = 0, \quad G_2(1) = \tau + c, \quad \text{and} \quad \lim_{t \to \infty} G_2(t) = 1. \tag{A.4}$$


Consider the value $b = (1, 0)$. Note that $X'\beta = X_1 + X_2$ can take negative values with
positive probability but $X'b = X_1$ is almost surely positive. It hence follows that
6 ^ 0 by the definition fl2.6p . Moreover for s > 0, 


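\begin{align*}
& P(Y = 1 \mid X'b = s) \tag{A.5} \\
&= E\left[ F_\xi\left( (1 + X_2^2)^{-1/2} (s + X_2) \right) \mid X_1 = s \right] \tag{A.6} \\
&= \int_{-1}^{1} F_\xi\left[ (1 + u^2)^{-1/2} (s + u) \right] du/2 \tag{A.7} \\
&> \int_{-1}^{1} F_\xi\left( u (1 + u^2)^{-1/2} \right) du/2. \tag{A.8}
\end{align*}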


Note that for each $u \in (-1, 1)$, $u(1 + u^2)^{-1/2}$ also falls within the interval $(-1, 1)$.
Therefore by (A.3), the term on the right-hand side of (A.8) equals

$$\int_{-1}^{1} \left[ \tau + cu(1 + u^2)^{-1/2} \right] du/2 = \tau. \tag{A.9}$$

Hence, sgn[X' 6 ] = sgn[P(y = l\X'b) — r] almost surely and we have that 6 G 0. 
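
The integrals in (A.7) to (A.9) can be checked numerically. The sketch below is only an illustration under stated assumptions: the values $\tau = 0.5$ and $c = 0.25$ and the exponential tails used for $G_1$ and $G_2$ are hypothetical choices that satisfy (A.4), and the function names are introduced here for convenience.

```python
import math

TAU, C = 0.5, 0.25  # illustrative values; any c in (0, min(tau, 1 - tau)) is allowed


def F_xi(t):
    """Distribution function of the form (A.3) with hypothetical tails G1 and G2."""
    if t <= -1.0:
        # Assumed G1: equals tau - c at t = -1 and tends to 0 as t -> -infinity.
        return (TAU - C) * math.exp(t + 1.0)
    if t <= 1.0:
        # Linear middle piece tau + c * t from (A.3).
        return TAU + C * t
    # Assumed G2: equals tau + c at t = 1 and tends to 1 as t -> infinity.
    return 1.0 - (1.0 - TAU - C) * math.exp(-(t - 1.0))


def cond_prob(s, n=20000):
    """Midpoint-rule approximation of the integral in (A.7), i.e. P(Y = 1 | X_1 = s)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        u = -1.0 + (i + 0.5) * h  # midpoint of the i-th cell of (-1, 1)
        total += F_xi((s + u) / math.sqrt(1.0 + u * u))
    return total * h / 2.0  # the density of X_2 on (-1, 1) is 1/2


for s in (0.0, 0.1, 0.5, 0.9):
    print(s, cond_prob(s))
```

Setting `s = 0` evaluates the integral in (A.8), which should agree with $\tau$ as in (A.9), while positive values of `s` should give probabilities strictly above $\tau$.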


Example 2: 0 can be a proper subset of 0 


Let A 
and 


(Xi,X 2 ,X 3 ) be a trivariate vector where Xi 


$$X_3 = \begin{cases} X_{3,1} & \text{if } X_1 + X_2 \geq 0 \\ X_{3,2} & \text{if } X_1 + X_2 < 0 \end{cases} \tag{A.10}$$


where $X_{3,1} \sim U(1, 2)$, $X_{3,2} \sim U(-2, -1)$ and the random variables $X_1$, $X_2$, $X_{3,1}$
and $X_{3,2}$ are independent. Assume that $\beta = (1, 1, 0)$ and $\varepsilon = \sqrt{1 + X_2^2}\,\xi$, where $\xi$ is
a random variable independent of $X$ and has the same distribution function as
defined by (A.3). Consider the value $b = (1, 0, 1)$. By design, $X'\beta$ and $X'b$ have the
same sign almost surely and hence 6 G 0. Now consider the vector 7 = (1, 0, 0). Since 
$X'\gamma = X_1$, by (A.5) - (A.8) and the arguments yielding the bound (A.9) in Example
1, it also follows that

$$P(Y = 1 \mid X'\gamma = s) > \tau \text{ for } s > 0.$$

Note that the event {X'b < 0 and Xi > 0} can occur with positive probability. There¬ 
fore we have that 6^0. 
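
The sign pattern used in Example 2 can be explored by simulation. The sketch below is illustrative only and rests on an assumption that should be read with care: it takes $X_1 \sim U(0, 1)$ and $X_2 \sim U(-1, 1)$ as in Example 1, and the function names and the seed are hypothetical.

```python
import random


def draw(rng):
    """One draw of (X'beta, X'b, X_1) under the assumed design."""
    x1 = rng.uniform(0.0, 1.0)    # assumption: X_1 ~ U(0, 1), as in Example 1
    x2 = rng.uniform(-1.0, 1.0)   # assumption: X_2 ~ U(-1, 1), as in Example 1
    # (A.10): X_3 is drawn from U(1, 2) or U(-2, -1) depending on the sign of X_1 + X_2.
    x3 = rng.uniform(1.0, 2.0) if x1 + x2 >= 0 else rng.uniform(-2.0, -1.0)
    x_beta = x1 + x2              # beta = (1, 1, 0)
    x_b = x1 + x3                 # b = (1, 0, 1)
    return x_beta, x_b, x1


def summarize(n=20000, seed=0):
    """Share of draws with matching signs, and frequency of {X'b < 0 and X_1 > 0}."""
    rng = random.Random(seed)
    same_sign = 0
    event = 0
    for _ in range(n):
        x_beta, x_b, x1 = draw(rng)
        same_sign += (x_beta >= 0) == (x_b >= 0)
        event += (x_b < 0 and x1 > 0)
    return same_sign / n, event / n


share, freq = summarize()
print(share, freq)
```

Under these assumptions $X'\beta$ and $X'b$ agree in sign in every draw, while the event $\{X'b < 0 \text{ and } X_1 > 0\}$ occurs in a non-negligible share of draws.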